nypd_shooting_df =
read_csv("data/nypd_shooting_data.csv") %>%
janitor::clean_names() %>%
separate(col = occur_date, into = c("month", "day", "year"), sep = "/") %>%
separate(col = occur_time, into = c("hour", "minute", "second"), sep = ":") %>%
mutate(across(where(is.character), tolower),
month = as.numeric(month),
month_name = recode(month, "1" = "january", "2" = "february", "3" = "march", "4" = "april", "5" = "may", "6" = "june", "7" = "july", "8" = "august", "9" = "september", "10" = "october", "11" = "november", "12" = "december"),
day = as.numeric(day),
year = as.numeric(year),
hour = as.numeric(hour),
minute = as.numeric(minute),
second = as.numeric(second),
minute_calc = hour * 60 + minute,
boro = as.factor(boro),
boro = fct_relevel(boro, "manhattan", "brooklyn", "bronx", "queens", "staten island")) %>%
select(incident_key, year, month_name, month, day, hour, minute, second, minute_calc, everything())
Here is a scatterplot of all shooting incidents using the latitude and longitude data. We overlaid these data points onto a map of NYC using the Leaflet library. Information on the month, year, borough, and the category of location (private house, public housing, restaurant, etc) is shown in the text box that appears when hovering over the plot.
pal <- colorFactor("viridis", nypd_shooting_df$year)
nypd_shooting_df %>%
mutate(
text_label = str_c(month_name, " ", year, ", ", boro, ", ", location_desc)) %>%
leaflet() %>%
addTiles() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addCircleMarkers(lat = ~latitude, lng = ~longitude, radius = .1, color = ~pal(year), label = ~text_label) %>%
addLegend("bottomright", pal = pal, values = ~year,
title = "year")
<<<<<<< HEAD
=======
>>>>>>> c8ea1a4c1d1ad400885041479ee7103c9984e198
We further explored the map by identifying hot spots of shootings within each region. These are indicated in the cluster map below. When you zoom in on the map, the clusters in each specific region become more granular. Go ahead, give it a try!
nypd_shooting_df %>%
leaflet() %>%
addTiles() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addCircleMarkers(lat = ~latitude, lng = ~longitude, radius = .25) %>%
addMarkers(
clusterOptions = markerClusterOptions())
<<<<<<< HEAD
=======
>>>>>>> c8ea1a4c1d1ad400885041479ee7103c9984e198
Brooklyn had the highest number of shootings across all boroughs, while Staten Island had the lowest number. We see that the number of shootings universally spiked in 2020. This is consistent with reports that violence erupted in the midst of the COVID-19 pandemic and social distancing measures, which coincided with a period of social unrest following the murder of George Floyd in Minneapolis.
boro_graph =
nypd_shooting_df %>%
group_by(year, boro) %>%
summarise(
count = n()) %>%
ggplot(aes(fill = boro, y = count, x = year)) +
geom_bar(position = "stack", stat = "identity") +
labs(title = "Number of Shootings by Boro from 2006-2021")
ggplotly(boro_graph)
<<<<<<< HEAD
=======
>>>>>>> c8ea1a4c1d1ad400885041479ee7103c9984e198
We wanted to see which location type was most prone to shootings. The table below shows that the following were the most common locations for shootings: multi-dwelling public housing and apartment buildings, private houses, grocery/bodegas, and bars/night clubs. Of note, 59.2% of entries were NA.
nypd_shooting_df %>%
na_if("none") %>%
group_by(location_desc) %>%
summarise(
count = n()) %>%
mutate(
percentage = count / sum(count) * 100,
percentage = round(percentage, digits = 1)) %>%
arrange(desc(percentage)) %>%
slice(1:10) %>%
drop_na(location_desc) %>%
knitr::kable()
| location_desc | count | percentage |
|---|---|---|
| multi dwell - public hous | 4559 | 17.8 |
| multi dwell - apt build | 2664 | 10.4 |
| pvt house | 893 | 3.5 |
| grocery/bodega | 622 | 2.4 |
| bar/night club | 588 | 2.3 |
| commercial bldg | 265 | 1.0 |
| restaurant/diner | 194 | 0.8 |
| beauty/nail salon | 105 | 0.4 |
| fast food | 99 | 0.4 |
Next, we wanted to further explore the top 5 shooting location types by borough. In all boroughs, multi-dwelling public housing sites had the highest proportion of shootings, and multi-dwelling apartment buildings had the second highest proportion of shootings. Bars and nightclubs more commonly had shooting incidents in Manhattan, Bronx, and Queens, but not in Brooklyn or Staten Island.
location_borough =
nypd_shooting_df %>%
na_if("none") %>%
group_by(boro, location_desc) %>%
summarise(
count = n()) %>%
mutate(
percentage = count / sum(count) * 100) %>%
arrange(desc(percentage)) %>%
slice(1:5) %>%
drop_na(location_desc) %>%
ggplot(aes(fill = location_desc, y = count, x = boro)) +
geom_bar(position = "stack", stat = "identity") +
labs(title = "Top 5 Shooting Location Types by Boro") +
theme(legend.position = "right")
ggplotly(location_borough)
<<<<<<< HEAD
=======
>>>>>>> c8ea1a4c1d1ad400885041479ee7103c9984e198